AITopics | temperature parameter

6cdb2cbb2083477cca5243843d6dad06-Paper-Conference.pdf

Neural Information Processing Systems, Feb-13-2026, 17:56:49 GMT

artificial intelligence, likelihood, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

926ffc0ca56636b9e73c565cf994ea5a-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-12-2026, 23:02:17 GMT

We thank the reviewers for their valuable comments. We are glad that reviewers noted our paper as novel (R1: "idea is "Decouple the effect of capacity increase and curriculum learning": We would like to We will also move related works section as suggested. We agree that this issue is important in the field of curriculum learning. "It could be interesting to show results on the large WebVision Benchmark": "Would proposed curriculum change robustness to adversarial attacks": On average, our method requires 20% fewer epochs. ImageNet, we conducted new experiments on WebVision dataset (2.3 million training images) and obtain significant Please see the first table above.

artificial intelligence, curriculum, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function

Pham, Tuan Minh, Cao, Thinh, Nguyen, Viet, Nguyen, Huy, Ho, Nhat, Rinaldo, Alessandro

arXiv.org Machine Learning, Feb-3-2026

The sigmoid gate in mixture-of-experts (MoE) models has been empirically shown to outperform the softmax gate across several tasks, ranging from approximating feed-forward networks to language modeling. Additionally, recent efforts have demonstrated that the sigmoid gate is provably more sample-efficient than its softmax counterpart under regression settings. Nevertheless, there are three notable concerns that have not been addressed in the literature, namely (i) the benefits of the sigmoid gate have not been established under classification settings; (ii) existing sigmoid-gated MoE models may not converge to their ground-truth; and (iii) the effects of a temperature parameter in the sigmoid gate remain theoretically underexplored. To tackle these open problems, we perform a comprehensive analysis of multinomial logistic MoE equipped with a modified sigmoid gate to ensure model convergence. Our results indicate that the sigmoid gate exhibits a lower sample complexity than the softmax gate for both parameter and expert estimation. Furthermore, we find that incorporating a temperature into the sigmoid gate leads to a sample complexity of exponential order due to an intrinsic interaction between the temperature and gating parameters. To overcome this issue, we propose replacing the vanilla inner product score in the gating function with a Euclidean score that effectively removes that interaction, thereby substantially improving the sample complexity to a polynomial order.
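To make the gating terminology in the abstract above concrete, here is a minimal sketch of a single sigmoid gate with a temperature, once with an inner-product score and once with a distance-based score. The exact functional forms used in the paper are not given in the abstract, so both `inner_product_gate` and `euclidean_gate`, along with the parameter names `a`, `b`, and `tau`, are assumptions made for illustration only. The example shows one simple kind of coupling: rescaling the gating parameters and the temperature together leaves the inner-product gate unchanged. Whether this is the interaction the authors analyze is not stated in the abstract.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inner_product_gate(x, a, b, tau):
    """Assumed form: sigmoid((a . x + b) / tau)."""
    score = sum(ai * xi for ai, xi in zip(a, x)) + b
    return sigmoid(score / tau)


def euclidean_gate(x, a, b, tau):
    """Hypothetical distance-based form: sigmoid((b - ||x - a||) / tau)."""
    dist = math.sqrt(sum((xi - ai) ** 2 for ai, xi in zip(a, x)))
    return sigmoid((b - dist) / tau)


x = [1.0, 2.0]          # illustrative input
a = [0.5, -0.25]        # illustrative gating weights
a_scaled = [2.0 * ai for ai in a]

# Doubling (a, b) and tau together gives the same inner-product gate value.
g1 = inner_product_gate(x, a, 0.1, tau=1.0)
g2 = inner_product_gate(x, a_scaled, 0.2, tau=2.0)

# The same rescaling changes the distance-based gate value.
e1 = euclidean_gate(x, a, 0.1, tau=1.0)
e2 = euclidean_gate(x, a_scaled, 0.2, tau=2.0)

print(g1, g2)
print(e1, e2)
```

In this toy setting the pair `(a, b)` and `tau` cannot be told apart from the inner-product gate's output alone, while the distance-based score does not share that particular invariance.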

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2602.01466

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

985e9a46e10005356bbaf194249f6856-Supplemental.pdf

Neural Information Processing Systems, Nov-15-2025, 05:32:43 GMT

assumption 2, lyapunov function, opponent, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.93)

Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering

Zacharopoulos, Christos-Nikolaos, Kyriakoglou, Revekka

arXiv.org Artificial Intelligence, Nov-7-2025

As Large Language Models (LLMs) become integral to human-centered applications, understanding their personality-like behaviors is increasingly important for responsible development and deployment. This paper systematically evaluates six LLMs, applying the Big Five Inventory-2 (BFI-2) framework, to assess trait expressions under varying sampling temperatures. We find significant differences across four of the five personality dimensions, with Neuroticism and Extraversion susceptible to temperature adjustments. Further, hierarchical clustering reveals distinct model clusters, suggesting that architectural features may predispose certain models toward stable trait profiles. Taken together, these results offer new insights into the emergence of personality-like patterns in LLMs and provide a new perspective on model tuning, selection, and the ethical governance of AI systems. We share the data and code for this analysis here: https://osf.io/bsvzc/?view_only=6672219bede24b4e875097426dc3fac1
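For readers unfamiliar with the term, the sketch below illustrates what a sampling temperature usually does to a next-token distribution: logits are divided by the temperature before the softmax, so low temperatures sharpen the distribution and high temperatures flatten it. The logits and temperature values are hypothetical and are not taken from the paper, and the specific models and decoding settings the authors used are not described in the abstract.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits

for t in (0.2, 1.0, 2.0):
    p = softmax_with_temperature(logits, t)
    print(t, [round(q, 3) for q in p], round(entropy(p), 3))
```

Entropy rises with the temperature here, which is the sense in which higher temperatures make sampled outputs more varied.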

creativity, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2511.04499

Country: Europe > France (0.14)

Genre:

Research Report > Experimental Study (0.71)
Research Report > New Finding (0.71)

Industry: Health & Medicine (0.96)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Revisiting Logistic-softmax Likelihood in Bayesian Meta-learning for Few-shot Classification

Neural Information Processing Systems, Oct-8-2025, 20:58:13 GMT

Furthermore, we theoretically and empirically show that softmax can be viewed as a special case of logistic-softmax and that logistic-softmax induces a larger family of data distributions than softmax.
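The relationship between the two likelihoods can be checked numerically. The sketch below assumes the common definition of logistic-softmax as sigmoid outputs normalized to sum to one; the snippet above does not give the definition, so treat this as an assumption. It uses an elementary observation: for very negative inputs the sigmoid behaves like the exponential, so shifting all logits down by a large constant makes logistic-softmax approach softmax. This is one simple way to see a connection and is not claimed to be the paper's own argument.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Standard softmax with max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logistic_softmax(logits):
    """Assumed definition: sigmoid(f_c) / sum over c' of sigmoid(f_c')."""
    s = [sigmoid(z) for z in logits]
    total = sum(s)
    return [v / total for v in s]


logits = [2.0, 0.5, -1.0]              # illustrative class logits
shifted = [z - 30.0 for z in logits]   # same logits, shifted far negative

sm = softmax(logits)                   # softmax is invariant to the shift
ls_plain = logistic_softmax(logits)
ls_shifted = logistic_softmax(shifted)

gap_plain = max(abs(p - q) for p, q in zip(sm, ls_plain))
gap_shifted = max(abs(p - q) for p, q in zip(sm, ls_shifted))
print(gap_plain, gap_shifted)
```

On the unshifted logits the two likelihoods differ noticeably, while on the shifted logits they agree to many decimal places.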

artificial intelligence, likelihood, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.93)

926ffc0ca56636b9e73c565cf994ea5a-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-3-2025, 05:48:54 GMT

artificial intelligence, curriculum, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Privileged Self-Access Matters for Introspection in AI

Song, Siyuan, Lederman, Harvey, Hu, Jennifer, Mahowald, Kyle

arXiv.org Artificial Intelligence, Aug-21-2025

Whether AI models can introspect is an increasingly important practical question. But there is no consensus on how introspection is to be defined. Beginning from a recently proposed "lightweight" definition, we argue instead for a thicker one. According to our proposal, introspection in AI is any process which yields information about internal states through a process more reliable than one with equal or lower computational cost available to a third party. Using experiments where LLMs reason about their internal temperature parameters, we show they can appear to have lightweight introspection while failing to meaningfully introspect per our proposed definition.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.14802

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

985e9a46e10005356bbaf194249f6856-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 06:13:43 GMT

artificial intelligence, assumption 2, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.93)

Decision Transformer-Based Drone Trajectory Planning with Dynamic Safety-Efficiency Trade-Offs

Ji, Chang-Hun, Song, SiWoon, Han, Youn-Hee, Moon, SungTae

arXiv.org Artificial Intelligence, Jul-31-2025

A drone trajectory planner should be able to dynamically adjust the safety-efficiency trade-off according to varying mission requirements in unknown environments. Although traditional polynomial-based planners offer computational efficiency and smooth trajectory generation, they require expert knowledge to tune multiple parameters to adjust this trade-off. Moreover, even with careful tuning, the resulting adjustment may fail to achieve the desired trade-off. Similarly, although reinforcement learning-based planners are adaptable in unknown environments, they do not explicitly address the safety-efficiency trade-off. To overcome this limitation, we introduce a Decision Transformer-based trajectory planner that leverages a single parameter, Return-to-Go (RTG), as a temperature parameter to dynamically adjust the safety-efficiency trade-off. In our framework, since RTG intuitively measures the safety and efficiency of a trajectory, RTG tuning does not require expert knowledge. We validate our approach using Gazebo simulations in both structured grid and unstructured random environments. The experimental results demonstrate that our planner can dynamically adjust the safety-efficiency trade-off by simply tuning the RTG parameter. Furthermore, our planner outperforms existing baseline methods across various RTG settings, generating safer trajectories when tuned for safety and more efficient trajectories when tuned for efficiency. Real-world experiments further confirm the reliability and practicality of our proposed planner.

large language model, machine learning, trajectory, (21 more...)

arXiv.org Artificial Intelligence

2507.21506

Genre: Research Report > New Finding (0.66)

Industry: